AITopics | labelled data

Collaborating Authors

labelled data

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Bayesian Semi-supervised Learning with Graph Gaussian Processes

Yin Cheng Ng, Nicolò Colombo, Ricardo Silva

Neural Information Processing SystemsFeb-12-2026, 09:51:50 GMT

Neural Information Processing Systems http://nips.cc/

acquisition function, gaussian process, learning, (13 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Overview (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Mitigating the Antigenic Data Bottleneck: Semi-supervised Learning with Protein Language Models for Influenza A Surveillance

Xu, Yanhua

arXiv.org Artificial IntelligenceDec-8-2025

Influenza A viruses (IAVs) evolve antigenically at a pace that requires frequent vaccine updates, yet the haemagglutination inhibition (HI) assays used to quantify antigenicity are labor-intensive and unscalable. As a result, genomic data vastly outpace available phenotypic labels, limiting the effectiveness of traditional supervised models. We hypothesize that combining pre-trained Protein Language Models (PLMs) with Semi-Supervised Learning (SSL) can retain high predictive accuracy even when labeled data are scarce. We evaluated two SSL strategies, Self-training and Label Spreading, against fully supervised baselines using four PLM-derived embeddings (ESM-2, ProtVec, ProtT5, ProtBert) applied to haemagglutinin (HA) sequences. A nested cross-validation framework simulated low-label regimes (25%, 50%, 75%, and 100% label availability) across four IAV subtypes (H1N1, H3N2, H5N1, H9N2). SSL consistently improved performance under label scarcity. Self-training with ProtVec produced the largest relative gains, showing that SSL can compensate for lower-resolution representations. ESM-2 remained highly robust, achieving F1 scores above 0.82 with only 25% labeled data, indicating that its embeddings capture key antigenic determinants. While H1N1 and H9N2 were predicted with high accuracy, the hypervariable H3N2 subtype remained challenging, although SSL mitigated the performance decline. These findings demonstrate that integrating PLMs with SSL can address the antigenicity labeling bottleneck and enable more effective use of unlabeled surveillance sequences, supporting rapid variant prioritization and timely vaccine strain selection.

artificial intelligence, machine learning, sequence, (16 more...)

arXiv.org Artificial Intelligence

2512.05222

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)

Add feedback

Informative missingness and its implications in semi-supervised learning

Wu, Jinran, Wang, You-Gan, McLachlan, Geoffrey J.

arXiv.org Machine LearningDec-5-2025

Semi-supervised learning (SSL) constructs classifiers using both labelled and unlabelled data. It leverages information from labelled samples, whose acquisition is often costly or labour-intensive, together with unlabelled data to enhance prediction performance. This defines an incomplete-data problem, which statistically can be formulated within the likelihood framework for finite mixture models that can be fitted using the expectation-maximisation (EM) algorithm. Ideally, one would prefer a completely labelled sample, as one would anticipate that a labelled observation provides more information than an unlabelled one. However, when the mechanism governing label absence depends on the observed features or the class labels or both, the missingness indicators themselves contain useful information. In certain situations, the information gained from modelling the missing-label mechanism can even outweigh the loss due to missing labels, yielding a classifier with a smaller expected error than one based on a completely labelled sample analysed. This improvement arises particularly when class overlap is moderate, labelled data are sparse, and the missingness is informative. Modelling such informative missingness thus offers a coherent statistical framework that unifies likelihood-based inference with the behaviour of empirical SSL methods.

information, mechanism, missingness, (15 more...)

arXiv.org Machine Learning

2512.04392

Country:

Oceania > Australia > Queensland (0.04)
Asia > China > Hunan Province (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Bayesian Semi-supervised Learning with Graph Gaussian Processes

Yin Cheng Ng, Nicolò Colombo, Ricardo Silva

Neural Information Processing SystemsNov-20-2025, 14:58:44 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, deep learning, machine learning, (15 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Overview (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Evolutionary Machine Learning meets Self-Supervised Learning: a comprehensive survey

Vinhas, Adriano, Correia, João, Machado, Penousal

arXiv.org Artificial IntelligenceNov-5-2025

The number of studies that combine Evolutionary Machine Learning and self-supervised learning has been growing steadily in recent years. Evolutionary Machine Learning has been shown to help automate the design of machine learning algorithms and to lead to more reliable solutions. Self-supervised learning, on the other hand, has produced good results in learning useful features when labelled data is limited. This suggests that the combination of these two areas can help both in shaping evolutionary processes and in automating the design of deep neural networks, while also reducing the need for labelled data. Still, there are no detailed reviews that explain how Evolutionary Machine Learning and self-supervised learning can be used together. To help with this, we provide an overview of studies that bring these areas together. Based on this growing interest and the range of existing works, we suggest a new sub-area of research, which we call Evolutionary Self-Supervised Learning and introduce a taxonomy for it. Finally, we point out some of the main challenges and suggest directions for future research to help Evolutionary Self-Supervised Learning grow and mature as a field.

artificial intelligence, evolutionary algorithm, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2504.07213

Country:

Europe (0.67)
North America > United States > Minnesota (0.28)
North America > United States > California (0.28)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
(2 more...)

Add feedback

AnomalyMatch: Discovering Rare Objects of Interest with Semi-supervised and Active Learning

Gómez, Pablo, Ruhberg, Laslo E., Nardone, Maria Teresa, O'Ryan, David

arXiv.org Artificial IntelligenceOct-31-2025

Anomaly detection in large datasets is essential in astronomy and computer vision. However, due to a scarcity of labelled data, it is often infeasible to apply supervised methods to anomaly detection. We present AnomalyMatch, an anomaly detection framework combining the semi-supervised FixMatch algorithm using EfficientNet classifiers with active learning. AnomalyMatch is tailored for large-scale applications and integrated into the ESA Datalabs science platform. In this method, we treat anomaly detection as a binary classification problem and efficiently utilise limited labelled and abundant unlabelled images for training. We enable active learning via a user interface for verification of high-confidence anomalies and correction of false positives. Evaluations on the GalaxyMNIST astronomical dataset and the miniImageNet natural-image benchmark under severe class imbalance display strong performance. Starting from five to ten labelled anomalies, we achieve an average AUROC of 0.96 (miniImageNet) and 0.89 (GalaxyMNIST), with respective AUPRC of 0.82 and 0.77. After three active learning cycles, anomalies are ranked with 76% (miniImageNet) to 94% (GalaxyMNIST) precision in the top 1% of the highest-ranking images by score. We compare to the established Astronomaly software on selected 'odd' galaxies from the 'Galaxy Zoo - The Galaxy Challenge' dataset, achieving comparable performance with an average AUROC of 0.83. Our results underscore the exceptional utility and scalability of this approach for anomaly discovery, highlighting the value of specialised approaches for domains characterised by severe label scarcity.

anomaly, data mining, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2505.03509

Country: Europe (0.67)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

Semi-supervisedly Co-embedding Attributed Networks

Zaiqiao Meng, Shangsong Liang, Jinyuan Fang, Teng Xiao

Neural Information Processing SystemsOct-2-2025, 02:11:37 GMT

Neural Information Processing Systems http://nips.cc/

latexit sha1, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Identifying birdsong syllables without labelled data

Teng, Mélisande, Boussard, Julien, Rolnick, David, Larochelle, Hugo

arXiv.org Artificial IntelligenceSep-27-2025

Identifying sequences of syllables within birdsongs is key to tackling a wide array of challenges, including bird individual identification and better understanding of animal communication and sensory-motor learning. Recently, machine learning approaches have demonstrated great potential to alleviate the need for experts to label long audio recordings by hand. However, they still typically rely on the availability of labelled data for model training, restricting applicability to a few species and datasets. In this work, we build the first fully unsupervised algorithm to decompose birdsong recordings into sequences of syllables. We first detect syllable events, then cluster them to extract templates -- syllable representations -- before performing matching pursuit to decompose the recording as a sequence of syllables. We evaluate our automatic annotations against human labels on a dataset of Bengalese finch songs and find that our unsupervised method achieves high performance. We also demonstrate that our approach can distinguish individual birds within a species through their unique vocal signatures, for both Bengalese finches and another species, the great tit.

artificial intelligence, machine learning, template, (19 more...)

arXiv.org Artificial Intelligence

2509.18412

Country: North America > Canada (0.28)

Genre: Research Report (0.64)

Industry:

Media > Music (0.50)
Leisure & Entertainment (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Add feedback

Flick: Few Labels Text Classification using K-Aware Intermediate Learning in Multi-Task Low-Resource Languages

Almutairi, Ali, Alsuhaibani, Abdullah, Jameel, Shoaib, Naseem, Usman, Mohammadi, Gelareh, Razzak, Imran

arXiv.org Artificial IntelligenceJun-13-2025

Training deep learning networks with minimal supervision has gained significant research attention due to its potential to reduce reliance on extensive labelled data. While self-training methods have proven effective in semi-supervised learning, they remain vulnerable to errors from noisy pseudo labels. Moreover, most recent approaches to the few-label classification problem are either designed for resource-rich languages such as English or involve complex cascading models that are prone to overfitting. To address the persistent challenge of few-label text classification in truly low-resource linguistic contexts, where existing methods often struggle with noisy pseudo-labels and domain adaptation, we propose Flick. Unlike prior methods that rely on generic multi-cluster pseudo-labelling or complex cascading architectures, Flick leverages the fundamental insight that distilling high-confidence pseudo-labels from a broader set of initial clusters can dramatically improve pseudo-label quality, particularly for linguistically diverse, low-resource settings. Flick introduces a novel pseudo-label refinement component, a departure from traditional pseudo-labelling strategies by identifying and leveraging top-performing pseudo-label clusters. This component specifically learns to distil highly reliable pseudo-labels from an initial broad set by focusing on single-cluster cohesion and leveraging an adaptive top-k selection mechanism. This targeted refinement process is crucial for mitigating the propagation of errors inherent in low-resource data, allowing for robust fine-tuning of pre-trained language models with only a handful of true labels. We demonstrate Flick's efficacy across 14 diverse datasets, encompassing challenging low-resource languages such as Arabic, Urdu, and Setswana, alongside English, showcasing its superior performance and adaptability.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2506.10292

Country:

Europe (0.46)
Oceania > Australia (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.86)

Add feedback

Overcoming Data Scarcity in Scanning Tunnelling Microscopy Image Segmentation

Kolev, Nikola L., Trouton, Max, Canova, Filippo Federici, Thornton, Geoff, Gao, David Z., Curson, Neil J., Stock, Taylor J. Z.

arXiv.org Artificial IntelligenceJun-3-2025

Scanning tunnelling microscopy (STM) is a powerful technique for imaging surfaces with atomic resolution, providing insight into physical and chemical processes at the level of single atoms and molecules. A regular task of STM image analysis is the identification and labelling of features of interest against a uniform background. Performing this manually is a labour-intensive task, requiring significant human effort. To reduce this burden, we propose an automated approach to the segmentation of STM images that uses both few-shot learning and unsupervised learning. Our technique offers greater flexibility compared to previous supervised methods; it removes the requirement for large manually annotated datasets and is thus easier to adapt to an unseen surface while still maintaining a high accuracy. We demonstrate the effectiveness of our approach by using it to recognise atomic features on three distinct surfaces: Si(001), Ge(001), and TiO$_2$(110), including adsorbed AsH$_3$ molecules on the silicon and germanium surfaces. Our model exhibits strong generalisation capabilities, and following initial training, can be adapted to unseen surfaces with as few as one additional labelled data point. This work is a significant step towards efficient and material-agnostic, automatic segmentation of STM images.

artificial intelligence, machine learning, segmentation, (18 more...)

arXiv.org Artificial Intelligence

2506.01678

Country: Europe (1.00)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback